[SPARK-56781][PYTHON] Refactor SQL_GROUPED_AGG_PANDAS_UDF #55808

Draft
Yicong-Huang wants to merge 1 commit into apache:master from Yicong-Huang:SPARK-56781
Yicong-Huang commented May 11, 2026

What changes were proposed in this pull request?

Refactor SQL_GROUPED_AGG_PANDAS_UDF to use ArrowStreamGroupSerializer as a pure I/O layer, moving the per-group pandas conversion and UDF invocation into read_udfs() in worker.py. The custom ArrowStreamAggPandasUDFSerializer is no longer used for this eval type, although SQL_GROUPED_AGG_PANDAS_ITER_UDF and SQL_WINDOW_AGG_PANDAS_UDF still use it.
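
For readers unfamiliar with the split described above, here is a library-free toy model of the idea. It is not the code in this PR and does not use PySpark, Arrow, or pandas. Every name in it (`Batch`, `read_groups`, `concat_columns`, `evaluate_grouped_agg`) is hypothetical, and a dict of lists stands in for an Arrow record batch. It only illustrates the separation: the reader yields raw per-group batches, while conversion and UDF invocation happen on the caller's side.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical stand-in for an Arrow record batch: column name -> values.
Batch = Dict[str, List[float]]


def read_groups(stream: Iterable[List[Batch]]) -> Iterator[List[Batch]]:
    """Pure I/O layer: yield each group's raw batches untouched.

    No conversion and no UDF call happens here.
    """
    for group_batches in stream:
        yield group_batches


def concat_columns(batches: List[Batch]) -> Batch:
    """Caller-side conversion: merge one group's batches column by column."""
    merged: Batch = {}
    for batch in batches:
        for name, values in batch.items():
            merged.setdefault(name, []).extend(values)
    return merged


def evaluate_grouped_agg(
    stream: Iterable[List[Batch]],
    udf: Callable[[List[float]], float],
    column: str,
) -> Iterator[float]:
    """Caller-side evaluation: convert each group, then apply the aggregate."""
    for batches in read_groups(stream):
        columns = concat_columns(batches)
        yield udf(columns[column])


# Two groups; the first arrives split across two batches.
stream = [
    [{"v": [1.0, 2.0]}, {"v": [3.0]}],
    [{"v": [10.0]}],
]
results = list(evaluate_grouped_agg(stream, sum, "v"))
print(results)  # [6.0, 10.0]
```

In this toy model, changing how a group is converted or how the aggregate is invoked touches only `evaluate_grouped_agg`, not the reader.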

Why are the changes needed?

Part of SPARK-55388.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests. No behavior change.

ASV benchmark comparison (GroupedAggPandasUDFTimeBench; each figure is the minimum of 3 samples per scenario):

Scenario         UDF                Before     After   Change
few_groups_sm    sum_udf            33.46ms   27.31ms  -18.4%
few_groups_sm    mean_multi_udf     35.42ms   29.24ms  -17.4%
few_groups_lg    sum_udf            52.52ms   41.37ms  -21.2%
few_groups_lg    mean_multi_udf     58.26ms   46.82ms  -19.6%
many_groups_sm   sum_udf          1223.22ms 1012.82ms  -17.2%
many_groups_sm   mean_multi_udf   1295.45ms 1059.40ms  -18.2%
many_groups_lg   sum_udf           350.11ms  285.29ms  -18.5%
many_groups_lg   mean_multi_udf    377.48ms  309.53ms  -18.0%
wide_cols        sum_udf           362.97ms  286.47ms  -21.1%
wide_cols        mean_multi_udf    378.80ms  299.86ms  -20.8%
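
The Change column appears to be the relative difference `(After - Before) / Before`, so a negative value means the refactored path is faster. The short check below assumes that formula; it recomputes the column from the Before and After timings in the table and compares each result with the reported figure. The helper name `percent_change` is illustrative and not part of the benchmark suite.

```python
def percent_change(before_ms: float, after_ms: float) -> float:
    """Relative change from before to after, in percent (negative = faster)."""
    return (after_ms - before_ms) / before_ms * 100.0


# (scenario, udf, before_ms, after_ms, reported_change_pct), copied from the table.
rows = [
    ("few_groups_sm", "sum_udf", 33.46, 27.31, -18.4),
    ("few_groups_sm", "mean_multi_udf", 35.42, 29.24, -17.4),
    ("few_groups_lg", "sum_udf", 52.52, 41.37, -21.2),
    ("few_groups_lg", "mean_multi_udf", 58.26, 46.82, -19.6),
    ("many_groups_sm", "sum_udf", 1223.22, 1012.82, -17.2),
    ("many_groups_sm", "mean_multi_udf", 1295.45, 1059.40, -18.2),
    ("many_groups_lg", "sum_udf", 350.11, 285.29, -18.5),
    ("many_groups_lg", "mean_multi_udf", 377.48, 309.53, -18.0),
    ("wide_cols", "sum_udf", 362.97, 286.47, -21.1),
    ("wide_cols", "mean_multi_udf", 378.80, 299.86, -20.8),
]

# Recompute each row and check it against the reported value at one decimal.
computed = [round(percent_change(before, after), 1) for _, _, before, after, _ in rows]
reported = [change for *_, change in rows]
all_match = computed == reported
```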

Was this patch authored or co-authored using generative AI tooling?

No.
